Add Two Gateway Controllers to Postgres Test Suite to always test based on eventing#1885
Conversation
📝 WalkthroughSummaryThis PR adds a two-controller Postgres test topology to the gateway integration test suite to enable testing based on eventing. The changes establish a runtime-facing controller replica alongside the primary gateway controller within the Postgres-backed test environment. Key ChangesTest Infrastructure Updates:
Configuration and Monitoring:
ImpactThe test suite now validates the two-controller Postgres setup with proper xDS synchronization probing and Postgres-backed replica coordination, ensuring the system works correctly in high-availability configurations using event-based synchronization. WalkthroughThis pull request introduces a two-controller topology for PostgreSQL-backed gateway integration tests. A new runtime-facing gateway-controller service is added to the Docker Compose configuration to handle policy-chain xDS synchronization, separate from the management REST controller. An environment variable Sequence DiagramsequenceDiagram
participant GitHubWorkflow
participant Makefile
participant SuiteInit as InitializeTestSuite
participant Detector as policySnapshotControllerAdminURL
participant Config
participant HealthProbe as waitForPolicySnapshotSync
participant RuntimeCtrl as gateway-controller-xds
GitHubWorkflow->>Makefile: IT_GATEWAY_CONTROLLER_RUNTIME=true
Makefile->>SuiteInit: make test-integration
SuiteInit->>Detector: Check IT_GATEWAY_CONTROLLER_HA
alt Runtime Enabled
Detector->>Config: Set PolicySnapshotControllerAdminURL
Config->>HealthProbe: Provide runtime controller URL
HealthProbe->>RuntimeCtrl: Probe /xds_sync_status
else Runtime Disabled
Detector->>Config: Use default GatewayControllerAdminURL
Config->>HealthProbe: Provide management controller URL
end
HealthProbe->>RuntimeCtrl: Verify policy-chain sync
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain modules listed in go.work or their selected dependencies" Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
12021fb to
006c7e5
Compare
d374518 to
9478464
Compare
…ed on eventing Update workflow file to include IT_GATEWAY_CONTROLLER_RUNTIME=true flag
9478464 to
80fd466
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@gateway/it/setup.go`:
- Around line 54-58: CheckPortsAvailable() currently omits the new
GatewayControllerRuntimeAdminPort ("9093"), so pre-flight port validation won't
catch conflicts for the runtime admin port; update the CheckPortsAvailable
function to include GatewayControllerRuntimeAdminPort (value "9093") in its
list/set of ports to check (the same place where other gateway ports are listed)
so the validation fails fast before docker-compose startup.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6e37c36c-20b4-4611-9be9-290762c0bf6c
📒 Files selected for processing (8)
.github/workflows/gateway-integration-test-postgres.ymlgateway/it/Makefilegateway/it/db_helpers.gogateway/it/docker-compose.test.postgres.yamlgateway/it/setup.gogateway/it/state.gogateway/it/steps_health.gogateway/it/suite_test.go
…fast in case of failure
| # Runtime-facing gateway-controller replica. The integration tests continue to | ||
| # send management REST calls to gateway-controller above; this controller only | ||
| # serves xDS to gateway-runtime after synchronizing through Postgres/EventHub. | ||
| gateway-controller-runtime: |
There was a problem hiding this comment.
This service name confused me at first glance. Because we already have gateway-runtime. What if we rename this to gateway-controller-dataplane or gateway-controller-xds?
| gateway-controller-runtime: | |
| gateway-controller-dataplane: |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
gateway/it/docker-compose.test.postgres.yaml (1)
75-76:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMonitor test stability after 50% memory reduction.
The management controller's memory has been halved from 1000m to 500m. While the xDS serving load is now handled by
gateway-controller-xds, the management controller still processes REST API calls, subscription events, and Postgres writes. Verify that 500m is sufficient under test load.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@gateway/it/docker-compose.test.postgres.yaml` around lines 75 - 76, Memory for the management controller has been reduced by changing mem_limit and mem_reservation from 1000m to 500m; run the test suite and a simulated load (REST calls, subscription events, Postgres writes) against the management controller and monitor for OOMs, increased latency, failed subscriptions, or DB write errors; if stability/regressions appear, restore mem_limit/mem_reservation to 1000m or incrementally increase until stable, or alternatively move heavier workload(s) to gateway-controller-xds and ensure resource limits/requests are adjusted accordingly.
🧹 Nitpick comments (1)
gateway/it/Makefile (1)
65-65: 💤 Low valueEnvironment variable wiring is correct.
The
IT_GATEWAY_CONTROLLER_HA=trueflag correctly propagates throughdb_helpers.policySnapshotControllerAdminURL()→suite_test.goinitialization →steps_health.waitForPolicySnapshotSync()to target the runtime controller's admin port (9093).💡 Optional: Consider renaming for clarity
The
IT_GATEWAY_CONTROLLER_HAname suggests high availability (redundancy/failover), but this topology implements a split management/runtime controller pattern. Consider renaming toIT_GATEWAY_CONTROLLER_RUNTIMEorIT_GATEWAY_CONTROLLER_SPLITfor clearer intent, unless "HA" is intentional for future high-availability plans.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@gateway/it/Makefile` at line 65, The environment var IT_GATEWAY_CONTROLLER_HA is correctly wired but its name may be misleading; if you want clarity rename the flag and all references (Makefile target line, usages in db_helpers.policySnapshotControllerAdminURL(), suite_test.go initialization, and steps_health.waitForPolicySnapshotSync()) to a clearer identifier like IT_GATEWAY_CONTROLLER_RUNTIME or IT_GATEWAY_CONTROLLER_SPLIT, ensuring you update every occurrence and related docs/tests so behavior is unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@gateway/it/docker-compose.test.postgres.yaml`:
- Around line 75-76: Memory for the management controller has been reduced by
changing mem_limit and mem_reservation from 1000m to 500m; run the test suite
and a simulated load (REST calls, subscription events, Postgres writes) against
the management controller and monitor for OOMs, increased latency, failed
subscriptions, or DB write errors; if stability/regressions appear, restore
mem_limit/mem_reservation to 1000m or incrementally increase until stable, or
alternatively move heavier workload(s) to gateway-controller-xds and ensure
resource limits/requests are adjusted accordingly.
---
Nitpick comments:
In `@gateway/it/Makefile`:
- Line 65: The environment var IT_GATEWAY_CONTROLLER_HA is correctly wired but
its name may be misleading; if you want clarity rename the flag and all
references (Makefile target line, usages in
db_helpers.policySnapshotControllerAdminURL(), suite_test.go initialization, and
steps_health.waitForPolicySnapshotSync()) to a clearer identifier like
IT_GATEWAY_CONTROLLER_RUNTIME or IT_GATEWAY_CONTROLLER_SPLIT, ensuring you
update every occurrence and related docs/tests so behavior is unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 291204d7-ddb2-4dd7-8370-68e681b17264
📒 Files selected for processing (8)
.github/workflows/gateway-integration-test-postgres.ymlgateway/it/Makefilegateway/it/db_helpers.gogateway/it/docker-compose.test.postgres.yamlgateway/it/setup.gogateway/it/state.gogateway/it/steps_health.gogateway/it/suite_test.go
✅ Files skipped from review due to trivial changes (1)
- gateway/it/suite_test.go
🚧 Files skipped from review as they are similar to previous changes (5)
- gateway/it/setup.go
- .github/workflows/gateway-integration-test-postgres.yml
- gateway/it/steps_health.go
- gateway/it/db_helpers.go
- gateway/it/state.go
Purpose
Goals
Approach
User stories
Documentation
Automation tests
Security checks
Samples
Related PRs
Test environment